We design monitor optimisations for detectEr, a runtime-verification tool synthesising systems of 
concurrent monitors from correctness properties for Erlang programs. We implement these optimisations 
as part of the existing tool and show that they yield considerably lower runtime overheads 
when compared to the unoptimised monitor synthesis. 

1 Introduction 

Runtime Verification (RV) [?] is a lightweight verification technique mitigating the scalability issues 
associated with exhaustive verification techniques such as model checking. Low overheads are an important 
requirement for the viability of any RV framework, where the additional computation introduced 
by the monitors should ideally be kept to a minimum. 

detectEr [?] is an RV tool for analysing the correctness of concurrent Erlang programs; the 
analysis of concurrent programs is notoriously hard and often leads to state explosion problems. From 
a safety correctness property (defined through a formal logic), detectEr generates a system of monitors 
that execute concurrently with the system under scrutiny, analysing its execution trace, and raising an 
alert as soon as a violation of the respective correctness property is detected. In [?], it is shown that the 
monitors generated by the tool are indeed correct (e.g., they only raise an alert when the system violates the 
respective property), whereas in [?] the authors study the relationship between synchronous and asynchronous 
instrumentation in this setting, establishing (amongst other things) that asynchronous monitoring 
consistently yields the lowest level of overheads. 

In this paper we study optimisation techniques for further lowering the overheads of the tool’s asynchronous 
monitors.¹ The monitor synthesis defined in [?] uses concurrent monitors to parallelise the 
runtime analysis as much as possible and better exploit the underlying hardware architecture (which 
nowadays typically includes multiple computing cores). However, in order to simplify the correctness 
proofs, this synthesis is kept as regular as possible: the monitor-generation strategy is the same for every 
logical construct and does not take into consideration the syntactic context in which that logical construct 
appears in the correctness property. Moreover, the communication organisation of the generated concurrent 
monitors is also kept static throughout the execution of the program, even though certain monitor 
subsystems become redundant during the runtime analysis. In this work we address these two potential 
sources of inefficiency by defining fine-tuned organisations of concurrent monitors specifically tailored 
to different forms of logical formulas; in addition, these monitors are also able to perform a degree of 
reconfiguration during the runtime analysis. We incorporate the new strategies into the existing tool and 
show that the generated monitors produce lower overheads than the existing monitor translations. 

The rest of the paper is structured as follows. § 2 introduces the tool whereas § 3 identifies 
inefficiencies and proposes solutions. § 4 discusses performance improvements and § 5 concludes. 

*The research work disclosed in this publication is partially funded by the Master it! Scholarship Scheme (Malta). 

¹ These optimisations may also be extended to synchronous monitors, and should also yield lower overheads in that setting. 
Figure 1: The Logic and its Semantics 


2 detectEr Primer 

The Logic: Correctness properties in detectEr are expressed through the logic sHML [?] (a syntactic 
subset of the modal μ-calculus). The sHML syntax is defined inductively using the BNF in Fig. 1. It is 
parametrised by a set of boolean expressions, b, c ∈ Bool, equipped with a decidable evaluation function, 
b ⇓ v where v ∈ {true, false}, and a set of actions α, β ∈ Act that may universally quantify over data values. 
It assumes two distinct denumerable sets of term variables, x, y, ... ∈ Var (used in actions and boolean 
expressions) and formula variables X, Y, ... ∈ LVar, used to define recursive (logical) formulas.² Formulas 
include truth and falsity, tt and ff, conjunctions, φ & ψ, modal necessities, [α]φ, maximal fixpoints 
to describe recursive properties, max X.φ, and conditionals to reason about data, if b then φ else ψ. A 
necessity formula, [α]φ, may contain term variables in α that pattern-match with actual (closed) actions, 
thus acting as a binder for these variables in the subformula φ; similarly, max X.φ is a binder for X in φ. 
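
For readers without access to Fig. 1, the grammar described in the paragraph above can be written out as follows. This is a reconstruction from the prose only, so the figure remains the authoritative definition; listing the formula variable X as a production of its own is an assumption, inferred from its use inside recursive formulas.

```latex
\[
\varphi, \psi \;::=\; \mathsf{tt}
  \;\mid\; \mathsf{ff}
  \;\mid\; \varphi \,\&\, \psi
  \;\mid\; [\alpha]\varphi
  \;\mid\; X
  \;\mid\; \max X.\varphi
  \;\mid\; \mathsf{if}\ b\ \mathsf{then}\ \varphi\ \mathsf{else}\ \psi
\]
```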

The semantics of the logic is defined for closed formulas, over Erlang programs interpreted as 
Labelled Transition Systems (LTSs), as shown in [?]. In our case, an LTS takes the form ⟨Actr, Act ∪ {τ}, →⟩, 
where A, B ∈ Actr are nodes denoting actor systems, Act ∪ {τ} are actions including a silent 
(internal) action τ, and → is a ternary relation of type Actr × (Act ∪ {τ}) × Actr; we write A −α→ B in lieu of 
(A, α, B) and use A =α⇒ B to denote A (−τ→)* · −α→ · (−τ→)* B. The semantics is given in Fig. 1 and follows 
that of [?]. No actor system satisfies ff, whereas all actor systems satisfy tt. Actors satisfying φ & ψ must 
satisfy both φ and ψ. Necessity formulas [α]φ are satisfied by all actor systems A observing the condition 
that, whenever pattern-matchable actions β are performed (yielding substitution σ :: Var ⇀ Val), the 
resulting actors B that are transitioned to must satisfy φσ. Note that actors that do not perform any 
pattern-matchable actions trivially satisfy [α]φ. Formula max X.φ denotes the maximal fixpoint of the 
functional ⟦φ⟧ and allows the logic to be defined over actors with infinite behaviour; following standard 
fixpoint theory [?], this is characterised as the union of all post-fixpoints S ∈ 𝒫(Actr) (in Fig. 1, {X ↦ S} 
denotes the substitution of S for X). Finally, a conditional, if b then φ else ψ, equates to φ whenever 
b evaluates to true and to ψ when b evaluates to false. 

² Although we here work up to α-conversion, detectEr accordingly renames duplicate variables during pre-processing. 

Example 2.1. Consider an Erlang system implementing a predecessor server receiving messages of the 
form (n, clientID) and returning n - 1 back to clientID whenever n > 0, but reporting the offending client 
to an error handler, err, whenever n = 0. It may also announce termination of service by sending a 
message to end. A safety correctness property in our logic would be: 


max X. [srv?{x,y}] ( ([end!_]ff) 
                   & ([err!z](if (x ≠ 0 ∨ y ≠ z) then ff else X)) 
                   & ([y!z](if z = (x - 1) then X else ff)) )              (1) 


It is a recursive formula, max X.(...), stating that, whenever the server implementation receives a request 
(input action), [srv?{x,y}]..., with value x and return (client) address y, then it should not: 

1. terminate the service (before handling the client request), [end!_]ff. 

2. report an error, [err!z]..., when x is not 0, or with a client other than the offending one, y ≠ z. 

3. service the client request, [y!z]..., with a value other than x - 1. 

These conditions are invariant; maximal fixpoints capture this invariance for server implementations that 
may never terminate (they are considered correct as long as the conditions above are not violated). ■ 
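
One way to read property (1) operationally is as a check over a finite trace of events. The sketch below is a sequential Python model written purely for illustration; it is neither detectEr's concurrent monitor nor its Erlang trace format. The event encoding (`("recv", "srv", (x, y))` for srv?{x,y} and `("send", target, value)` for target!value) and the name `first_violation` are assumptions introduced here.

```python
def first_violation(trace):
    """Return the index of the first event violating property (1), or None.

    Assumed (hypothetical) event encoding:
      ("recv", "srv", (x, y))  models srv?{x,y}
      ("send", target, value)  models target!value
    """
    pending = None  # (x, y) bound by the last matched [srv?{x,y}]
    for index, (kind, target, payload) in enumerate(trace):
        if pending is None:
            if kind == "recv" and target == "srv":
                pending = payload      # necessity matches: bind x and y
                continue
            return None                # no match: the formula holds trivially
        x, y = pending
        if kind != "send":
            return None                # none of the three conjuncts matches
        verdicts = []                  # True means "unfold X", False means ff
        if target == "end":            # [end!_]ff
            verdicts.append(False)
        if target == "err":            # [err!z](if (x != 0 or y != z) ...)
            verdicts.append(not (x != 0 or y != payload))
        if target == y:                # [y!z](if z = (x - 1) ...)
            verdicts.append(payload == x - 1)
        if not verdicts:
            return None                # every conjunct stops without a verdict
        if not all(verdicts):
            return index               # some conjunct reached ff
        pending = None                 # X unfolds: wait for the next request
    return None


good = [("recv", "srv", (3, "c")), ("send", "c", 2),
        ("recv", "srv", (0, "d")), ("send", "err", "d")]
bad = [("recv", "srv", (3, "c")), ("send", "c", 5)]
print(first_violation(good), first_violation(bad))
```

Returning `None` covers both "no violation so far" and "the monitor stopped because nothing matched", which mirrors the fact that a safety monitor only ever reports violations.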


The Monitor Synthesis: The synthesis algorithm of [?] aims to be modular: it generates independently 
executing monitor combinators for each logical construct, interacting with one another through 
message passing.³ For instance, the synthesis parallelises the analysis of the subformulas φ₁ and φ₂ 
in a conjunction φ₁ & φ₂ by (i) synthesising concurrent monitor systems for φ₁ and φ₂ respectively and (ii) 
creating a conjunction monitor combinator that receives trace events and forks (forwards) them to the 
independently-executing monitors of φ₁ and φ₂. Since the submonitor systems for φ₁ and φ₂ may be 
arbitrarily complex (needing to analyse a stream of events before reaching a verdict), the conjunction 
monitor is permanent in the monitor organisation generated, so as to fork and forward event streams of 
arbitrary length. It is also worth noting that the synthesis algorithm assumes formulas to be guarded, 
where recursive variables appear under necessity formulas; this is required to generate monitors that 
implement lazy unrolling of recursive formulas, thereby minimizing overheads (see [?] for details). 

Example 2.2. From formula (1), the monitor organisation m (depicted in Fig. 2) is generated, consisting 
of one process acting as the combinator for the necessity formula [srv?{x,y}]φ. If an event of the form 
srv?{v,c} is received, the process pattern-matches it with srv?{x,y} (mapping variables x and y to the 
values v and c respectively) and spawns the (dashed) monitor system shown underneath it in Fig. 2, instantiating 
the variables x,y with v,c respectively. The subsystem consists of three monitor subsystems, one for each 
subformula guarded by [srv?{x,y}] in (1), connected by two conjunction forking monitors. When the 
next trace event is received, e.g., c!(v-1) (a server reply to client c with value v - 1), the conjunction 
monitors replicate and forward this event to the three monitor subtrees. Two of these subtrees do not 
pattern-match this event and terminate; the third subtree (submonitor m2), however, pattern-matches it, 
instantiating z to (v - 1), evaluating the conditional and unfolding the recursive variable X to monitor 
m. If another server request event is received, srv?{v',c'} (with potentially different client and value 
arguments c' and v'), the conjunction monitors forward it to m, which pattern-matches it and generates a 
subsystem with two further conjunction combinators as before. ■ 


3 Optimizations 

Formula (1) is a pathological example, highlighting two inefficiencies introduced by the synthesis algorithm 
of § 2. First, conjunction monitors closely mirror their syntactic counterpart and can only handle 
forwarding to two sub-monitors. As a result, formulas with nested conjunctions, as in formula (1), 
translate into organisations of cascading conjunction monitors that are inefficient at forwarding trace 
events. For instance, the two cascading conjunction monitors of m in Fig. 2 replicate the trace event c!(v-1) 
four times in order to forward it to three sub-monitor systems; the problem is accentuated for higher 
numbers of nested conjunctions and repeated forwarding. 

³ Every combinator is implemented as a (lightweight) Erlang process (actor) [4], uniquely identifiable by its process ID. 
Messages sent to a process are received in its dedicated mailbox, and may be read selectively using (standard) pattern-matching. 

Figure 2: Monitor Generation for Formula (1) and its execution wrt. trace srv?{v,c}.c!(v-1).srv?{v',c'} 

Second, the current monitor implementation does not perform any monitor reorganisations at runtime. 
When a conjunction formula φ₁ & φ₂ is translated, the conjunction combinator monitor organisation 
is kept permanent throughout its execution because it is assumed that the respective sub-monitors for φ₁ and 
φ₂ are long-lived. This heuristic however does not apply in the case of formula (1), where two out of the 
three sub-monitors terminate after a single event is received. This feature, in conjunction with recursive 
unfolding, creates chains of indirections through conjunction monitors with only one child, as shown in 
Fig. 2 (bottom row). 


Figure 3: Optimised Synthesis for Formula (1) and its execution wrt. srv?{v,c}.c!(v-1).srv?{v',c'} 

Proposed Solutions: The first optimisation we introduce is that of conjunction monitor combinators 
that fork out to an arbitrary number of monitor subsystems. For instance, the monitor corresponding to 
formula (1) would translate into a monitor organisation consisting of one conjunction combinator with 
three children (instead of two combinators with two children each) as shown in Fig. 3 (left). This is more 
efficient in terms of the number of processes created, but also in terms of the number of replicated 
messages required to perform the necessary event forwarding to monitor subsystems. For example, the 
conjunction combinator of Fig. 3 generates three message replications to forward an event to the three 
sub-monitors (as opposed to the four messages of Fig. 2, as discussed earlier). 
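
The three-versus-four message counts for the two conjunction layouts can be checked mechanically. The sketch below is a Python model written for illustration, not detectEr code: a conjunction combinator is represented as a list of its children and any other sub-monitor as a string. The names `forwards`, `cascade`, `cascaded` and `flat` are hypothetical, and nesting the two-child conjunctions to the right is an assumption that does not affect the count.

```python
def forwards(monitor):
    """Count the messages conjunction combinators send to push one trace
    event down to every sub-monitor below them."""
    if not isinstance(monitor, list):      # a non-conjunction sub-monitor
        return 0
    # one copy of the event per child, plus whatever nested conjunctions send
    return len(monitor) + sum(forwards(child) for child in monitor)


def cascade(leaves):
    """Nest two-child conjunctions to the right, modelling the unoptimised
    synthesis (the nesting direction is an assumption)."""
    if len(leaves) == 1:
        return leaves[0]
    return [leaves[0], cascade(leaves[1:])]


subformulas = ["[end!_]ff", "[err!z]...", "[y!z]..."]
cascaded = cascade(subformulas)   # two conjunctions with two children each
flat = list(subformulas)          # one conjunction with three children

print(forwards(cascaded), forwards(flat))
```

In this model, n sub-monitors cost 2(n - 1) forwarded messages per event in the cascaded layout against n in the flat one, so the gap widens as more conjunctions are nested.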

The second optimisation considered is that of allowing conjunction monitor combinators to dynamically 
reorganise the monitor configuration so as to keep the event flow structure as efficient as possible. 
In order to keep overheads low, this reconfiguration operation should be kept local, where unaffected 
monitor subsystems should continue with their runtime analysis while the restructuring is in progress. 
Stated otherwise, the monitor reorganisation happens while trace events are still being received, and the 
operation needs to guarantee that (i) no trace events are lost and (ii) trace events are not reordered. 

Reorganisations are carried out by conjunction combinators, which are now allowed to add and delete 
monitor subsystems from their internal list of children. For instance, when an event causes a child 
sub-monitor to terminate, the parent (conjunction) monitor is sent a termination message which allows it to 
remove the terminated sub-monitor from its child-list. 

The reconfiguration protocol is kept local (i.e., other parts of the monitor graph are unaffected), and 
is carried out by two (multi-child) conjunction combinators in a parent-child setup. It proceeds as follows: 

1. When an event causes a child sub-monitor to become a system with a conjunction combinator at 
its root, it sends a merge-request to its parent. In the meantime the child sub-monitor may start 
receiving events from its parent and forwards them to its children. 

2. When the parent conjunction combinator reads the merge request, it sends a merge-ack back to the child 
monitor and waits for a merge-msg from this child; while waiting for this merge message, the 
parent monitor stops retrieving further trace events from its mailbox, effectively using it as a buffer 
for future events that may keep on being received. 

3. As soon as the child monitor receives the merge-ack message, it forwards all the remaining events 
in its mailbox to its children. Once it empties its mailbox, it sends a merge-msg back to the parent 
with a list of its children sub-monitors and waits for a merge-final message. 

4. Upon receiving merge-msg the parent removes the child sub-monitor sending the message and, 
instead, adds the sub-monitors sent by this child to its own list. It then sends a merge-final to the 
child monitor and waits for a merge-complete message. 

5. When the child receives merge-final, it retrieves any possible merge requests sent by its former 
children, forwards them in order to its parent, followed by a merge-complete message, and terminates. 

6. When the parent receives merge-complete, it reverts to its normal operation of trace forwarding. 

The monitor restructuring protocol discussed yields monitor organisations with only one (eventual) 
conjunction node at the root, and a list of monitor subsystems processing the forwarded events (a spider-like 
configuration). For instance, for the event trace srv?{v,c}.c!(v-1).srv?{v',c'}, the synthesised 
monitor for formula (1) yields the evolution shown in Fig. 3. 

Figure 4: Evaluation results 
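
The six-step merge protocol and its spider-like outcome can be exercised on a single parent-child pair. The sketch below is a Python model written for illustration and makes several assumptions: it is not detectEr's Erlang implementation, a deterministic round-robin loop stands in for concurrent scheduling, message tags use underscores, and merge requests from the child's own former children (step 5) are not modelled. All class and function names are hypothetical.

```python
from collections import deque


class Leaf:
    """A non-conjunction sub-monitor; it only records the events it receives."""

    def __init__(self, name):
        self.name, self.mailbox, self.seen = name, deque(), []

    def step(self):
        if not self.mailbox:
            return False
        self.seen.append(self.mailbox.popleft()[1])
        return True


class Conj:
    """A multi-child conjunction combinator, usable as parent or as child."""

    def __init__(self, name, children=(), parent=None):
        self.name, self.children, self.parent = name, list(children), parent
        self.mailbox, self.state = deque(), "normal"

    def take(self, tag):
        """Selective receive: remove and return the first message tagged tag."""
        for i, msg in enumerate(self.mailbox):
            if msg[0] == tag:
                del self.mailbox[i]
                return msg
        return None

    def forward(self, msg):
        for child in self.children:
            child.mailbox.append(msg)

    def request_merge(self):                                  # step 1 (child)
        self.parent.mailbox.append(("merge_request", self))

    def step(self):
        if self.state == "normal" and self.mailbox:
            msg = self.mailbox.popleft()
            if msg[0] == "event":
                self.forward(msg)
            elif msg[0] == "merge_request":                   # step 2 (parent)
                msg[1].mailbox.append(("merge_ack",))
                self.state = "await_msg"     # later events stay buffered
            elif msg[0] == "merge_ack":                       # step 3 (child)
                while (event := self.take("event")) is not None:
                    self.forward(event)
                self.parent.mailbox.append(("merge_msg", self, self.children))
                self.state = "await_final"
            return True
        if self.state == "await_msg":                         # step 4 (parent)
            msg = self.take("merge_msg")
            if msg is None:
                return False
            _, child, adopted = msg
            at = self.children.index(child)
            self.children[at:at + 1] = adopted   # child replaced by its children
            child.mailbox.append(("merge_final",))
            self.state = "await_complete"
            return True
        if self.state == "await_final":                       # step 5 (child)
            if self.take("merge_final") is None:
                return False
            self.parent.mailbox.append(("merge_complete",))
            self.state = "done"
            return True
        if self.state == "await_complete":                    # step 6 (parent)
            if self.take("merge_complete") is None:
                return False
            self.state = "normal"
            return True
        return False


def run_scenario():
    l1, l2, l3 = Leaf("m0"), Leaf("m1"), Leaf("m2")
    parent = Conj("parent")
    child = Conj("child", [l1, l2], parent)
    parent.children = [child, l3]
    parent.mailbox.append(("event", "e1"))       # arrives before the request
    child.request_merge()
    parent.mailbox.extend([("event", "e2"), ("event", "e3")])
    actors = [parent, child, l1, l2, l3]
    while any([actor.step() for actor in actors]):   # run until quiescent
        pass
    return parent, child, (l1, l2, l3)


parent, child, leaves = run_scenario()
print([leaf.seen for leaf in leaves])
```

In this run the events e2 and e3 sit in the parent's mailbox while the merge is in progress and are forwarded only after merge_complete, which is how the model keeps events from being lost or reordered.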


4 Evaluation 

Through a series of empirical tests, we verify whether the monitor optimisations of § 3 yield the expected 
overhead improvements. In particular, such gains are not obvious for the second optimisation presented 
in § 3, where reconfigurations introduce additional computation that may offset the lower overheads 
obtained from addressing the inefficiencies of redundant monitor code discussed in § 3. 
The tests are carried out on Yaws [?], a third-party commercial HTTP web server 
written in Erlang. In order to keep high levels of throughput, the Yaws server assigns a dedicated (concurrent) 
handler servicing HTTP requests for every client connection, thereby parallelising processing 
for multiple clients. The evaluation is based on a variety of safety properties for the Yaws server implementation, 
expressed in terms of the logic discussed in § 2. The tests carried out employ three synthesis 
algorithms to obtain monitors from these properties, namely (i) the unoptimised synthesis (presented in 
[?]), (ii) a monitor synthesis employing multi-child conjunction combinators (without reorganisation) 
and (iii) a synthesis employing both optimisations of § 3 (including dynamic reorganisations); these are 
compared to baseline readings, i.e., the unmonitored system. 

The tests measure the overheads induced by these three synthesis algorithms wrt. the baseline 
(unmonitored) Yaws execution for varying client loads. 
Overheads are calculated in terms of (i) the average CPU utilization; (ii) the memory overhead per client 
request; and (iii) the average time taken for the server to respond to batches of simultaneous client 
requests. The experiments are carried out on an Intel Core 2 Duo T6600 processor with 4GB of RAM, 
running Microsoft Windows 7 and EVM version R16B03. For each property and each client load, we 
take the average of three sets of readings. Since results did not yield substantial variations for the different 
properties synthesised, we present averaged readings across all properties in the graphs shown in Fig. 4. 

The results show that just using multi-child conjunction combinators yields modest yet consistent 
gains in terms of CPU usage, memory consumption and average response times, when compared with 
the two-pronged conjunction combinator of [?]. However, such monitors still appear to induce non-linear 
overhead increases, probably caused by the chains of monitor indirections created for recursive 
formula unfolding (see discussion in § 3). This problem however seems to be rectified for monitors with 
dynamic reorganisations, as can be seen from the graphs in Fig. 4. In particular, overheads appear to be 
comparable to the baseline execution for memory consumption. 


5 Conclusion 

We present monitor optimisations for detectEr, an RV tool synthesising concurrent monitors for Erlang 
correctness properties. We implement these optimisations as part of the existing tool and demonstrate 
that they yield considerably lower runtime overheads when compared to the original monitor synthesis 
presented in [?]. We conjecture that similar overhead improvements should be observed if these optimisation 
techniques are applied to the synchronous monitoring studied in [?]. 

Related Work: Several verification and modeling tools [?] for actor-based component systems 
already exist. Rebeca [?] is an actor-based modeling language providing automated translation to 
renowned model checkers like SMV and Promela; Timed Rebeca models have also been translated into 
Erlang. McErlang [?] is a model-checker specifically targeting Erlang code that uses a superset of our 
logic; to the best of our knowledge the tool does not consider any verification post-deployment, as in the 
case of RV. As far as we are aware, eLARVA [?] is the only other RV tool for Erlang programs. Similar 
to the setup studied in this work, it synthesises monitors that use the Erlang Virtual Machine tracing 
mechanism to obtain system trace events in asynchronous fashion. Apart from the source logic used 
(eLARVA properties are described as automata-based specifications), a key difference between this tool 
and detectEr is that eLARVA produces monolithic monitors, as opposed to the component-based monitor 
systems described in this paper; as a result, the optimisation techniques discussed do not apply. In [?], 
Sen et al. explore a decentralized (choreographed) monitoring approach as a way to reduce the communication 
overheads that are usually caused by a centralized approach and implement it in terms of an 
actor-based tool called DiAna. It would be interesting to explore to what extent the optimisations presented 
in this work can be extended to the distributed setting of DiAna, and whether these optimisations 
would yield comparable overhead gains. Our techniques may also be relevant to lower overheads in other 
component-based monitor synthesis algorithms such as in [?] (which has a fixed monitor organisation) 
or in [?] (which supports a level of dynamic reorganisation as updatable distributed tables). Similar 
investigations could also be carried out for the distributed monitoring approaches studied in [6]. 